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A NUMERICAL INVESTIGATION OF THE FINITE ELEMENT METHOD IN 
COMPRESSIBLE PRIMITIVE VARIABLE NAVIER-STOKES FLOW 


By 


C. H. Cooke ^ 
SUMMARY 


The results of a comprehensive numerical investigation of the basic 
capabilities of the finite element method (FEM) for numerical solution of 
compressible flow problems governed by the two-dimensional and axis- symmetric 
Navier-Stokes equations in primitive variables are presented. The strong and 
weak points of the method as a tool for computational fluid dynamics are 
considered. The relation of the linear element finite element method to 
finite difference methods (FDM) is explored. 

The calculation of free shear layer and separated flows over 
aircraft boattail afterbodies with plume simulators indicate the strongest 
assets of the method are its capabilities for reliable and acciarate 
calculation employing variable grids which readily approximate complex 
geometry and capably adapt to the presence of diverse regions of large 
solution gradients without the necessity of domain transformation. In all 
cases, numerical results have been in excellent agreement with those 
obtained by finite difference solution of the same physical problems, for 
diverse flows (some with embedded shocks) and a wide range of Reynolds 
mmbers . 

However, for sufficiently complex equations, finite element time marching 
schemes as presently conceived do not appear able to compete economically with 
the better finite difference methods of comparable accuracy. Greater over- 
head may be expected with the method in terms of both computer resources and 
man-year effort. The outstanding weakness of the finite element methodology 
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is the amoTint of computer time -required to produce acceptable problem 
solutions. The greatest future of the method in Navier-Stokes flows 
appears to be with the ^approach of seeking at the outset the solution of the 
steady state equations, where the high overhead per iterative step is offset by 
rapid convergence with a minimiom of iterations. 

INTRODUCTION 

During the past three decades fluid dynamics in the aerospace industry 
has provided a major impetus to the development of methods for the numerical 
solution of partial differential eqixations. Finite difference methods (FDM) 
have been extensively investigated, to the point that the flexibility, versa- 
tility, and adequacy of the various FDM techniques have firmly established it 
as the leading method for, the numerical modeling of complex fluid dynamic 
problems. Somewhat more recently studies have been directed to the investigation 
of the finite element method (PEM) as an alternative tool. The method is 
vmquestionably well respected in the fields of structural and solid mechanics, 
where the dynami'c interactions in complex configurations such as perhaps 
an aircraft wing freime with many easily identifed component parts may be 
modeled by considering each structural member as an element of the finite 
element system, logically as well as physically connected at the nodes of 
the problem. 

However, the modeling of a near inv-iscid fluid continuum by small 
subcontinuua suitably joined at certain points, lines, or planes, does not 
appear as intuitively natural as does the near inelastic solid continuum 
model. Of perhaps greater significance is the general tendency to cumber- 
someness experienced in the application of the PEM technique, which exhibits 
less flexibility towards individual innovation in general and in particular 
in the differencing of specific equation terms than is allowed by the 
customary FDM methods. Indeed, the general manner in which PEM computational 
results have been reported have led some to believe the method has been 
-oversold. As Roache (ref. 1) phrases the call for a more exacting critique 
of the method, "Perhaps it is time for someone to comment on the emperor's 
new clothes . " 
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Here we do not intend to imply that a’ccurate computational results are 
difficult to achieve; this is the rule rather than the exception, and 
represents one of the major strong points of F.EM. However, to quote Sahir 
(ref. 2 ) : "While considerable attention has been given in the published 

literature to efficiency and speed of (linear system) solution routines, 
little or no attention is given to the total time taken by computers in 
producing acceptable solutions . " Although the characteristics of the method 
have not been thoroughly assessed, it appears predictable that, aside from 
possibly potential flow calculations (ref. 3) or in some boundary layer 
problems (ref. 4) , in general FEM results can only be obtained at excessive 
costs. The hindrance of associated time-constming computer runs 
exorbitantly demanding of core storage, auxiliary devices, etc., is 
particularly hard to abide in batch-oriented computer systems fine tuned 
for rapid processing of myriad small jobs. Moreover, it does not seem 
that this deficiency. will be remedied by the next generation of advanced 
computer technology, since the method is in many respects not highly 
vectorizable . 

Perhaps more specifically, it does not appear that in their present 
conception the Fem time marching schemes for asymptotic calculation of 
steady high speed flows governed by the compressible Navier-Stokes equations 
in primitive variable form will ever be economically feasible, in comparison 
to the relatively greater economy provided by the better FDM methods , 
together with stretching transformations to simplify complex geometry. 

For problems whose simplification requires shearing as well as stretching 
in domain transformation, or for problems whose solution is sought at 
the outset by resort to the steady governing equations, the futiire of the 
method may not be as dark. It has been pointed out that the economic 
difficulties characteristic of the method are perhaps due in some respects 
to the tendency to the frontal attack on computational problems by FEM 
practitioners, as customary in the FDM approach. It may be that more success 
with the method in the future will result from taking another look at analytic 
methods such as quasi-linearization or local Prandtl-Glauert approximation 
(refs. 7, 8) together • with iterating nonlinearities, which have in the past 
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not been considered feasible. Certainly for those cases in which the high 
overhead per iterative step is compensated by accelerated convergence rates 
requiring few computational steps to convergence the comparative economy of 
the method is yet to be decided (refs. 3 to 10) . However, the FDM 
practitioners are making rapid progress in these areas as well (refs. 11, 12), 
and FDM could again overstride the FEM results — for” the same reasons, the 
cumbersomeness- of the approach. It could be there are special classes of 
problems such as in transonic flows (ref. 13) or large .scale meteorological 
studies (refs. 6, 13, 14, 16) where FDM practice is not sufficently advanced 
or the mathematical intricacies of the physical problem not well under- 
stood that F^ technology in the hands of an able practitioner will make a 
contribution. In general, these special cases are not too well known or 
as yet await recognition.' 

A CRITIQUE OF THE FINITE ELEMENT METHOD (FEM) 

A comprehensive investigation of the FEM methods, applied to com- 
pressible flows governed by the two-dimensional Navier-Stokes equations has 
been undertaken by the author. Four distinct primitive variable codes have 
been developed. Flow calculations are performed for several problem classes: 
free shear layer flows at Re = 1000 , for fully supersonic (M = 3) and 
mixed subsonic-supersonic (M = 1.68 to M = 3) jet mixing; a imiform flow with 
embedded oblique shock but no recirculation, at Re = 80 and Re = 80,000 ; 
and the mixing and recirculating flow with weak shock (at Re = 12,365) 
represented by the boattail afterbody problem. 

For the free shear layer flow problems in rectangular coordinates the 
constant total temperature assumption employed allowed the energy equation 
to be replaced with an algebraic relationship. The governing equations then 
correspond to mass conservation and momentirn conservation in two components. 
Higher order elements (the C° cubic on triangles) were employed with two 
algorithm classes; 

(a) Implicit time marching code for the unsteady equations (ref. 17) , 

and 
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(b) Block iterative solution of nonlinear systems of algebraic 
equations arising from FEM models of the steady governing equations 
(ref. 5) . 

Linear triangular element quasi-explicit time marching schemes were 
applied to the full Navier-Stokes equations, for the following cases: 

(c) (perhaps) academic calculation in rectangular coordinates of ah 
oblique shock in otherwise uniform flow, using inconsistent (lumped) linear 
element schemes, and 

(d) Calculations in axis-symmetric cylindrical coordinates of the 
boattail afterbody problem with embedded weak shocks and recirculation. 

The purpose of this paper is to discuss some of the assets and 
liabilities of FEM in fluid dynamic application; to sturanarize sortie of the 
findings of this investigation; and to report more fully upon the phase of 
the investigation concerning linear elements and the complete Navier-Stokes 
equations, codes (c) and (d) . 

The Advantages of FEM 

Before consideration of the adverse features of FEM, we consider 
briefly some of its so-called advantages; these are generally considered in 
the literature to be; 

(a) Variable gridworks, allowing economy of grid point allocation; 

(b) Possible triangular elements, allowing complex (non-linear) boundaries 
to be fit readily; 

(c) Higher order elements, allowing greater accuracy; 

(d) Typically easier handling of boundary conditions; and 

(e) Accurate and reliable results from algorithms which leave little 
choice of the way in which individual equation terms are differenced. 

The survey paper of Roache (ref. 1) provides a fairly unbiased 
examination of these characteristics of FEM methods, some points of which 
may be repeated here, together with some additional observations based on 
results included in the present paper. 
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Variable grid capability and fitting of curved boundaries together with 
accurate final results appear to be the greatest assets of FEM. There may be 
classes of particular problems where the cost of these assets is reasonable; 
but such probably do not include the class with geometry amenable to reform 
by means of stretching transformations in one or more variables. Even so, 
to make use of this asset, in addition to the costs already discussed some 
device for automatic choice of the grid point placement is necessary in order 
to eliminate enormous manual labor and human error. The stretching trans- 
formation utilized by Holst (ref. 18) which maps a uniform grid in the computa- 
tional plane into an irregular grid in the physical plane can be considered ideal 
for such: The nonuniform grid so devised is, of course, topologically equiv- 

alent to a rectangular grid for which a mesh generator is particularly simple to 
code. Thus, the grid point selection device is readily automated~and parameters 
easily adjustable to position the grid points where they are required for 
resolution. If such a device is needed to assure proper placement of nodes, we 
are a small step from transforming the equations and obviating the advantages 
(a) and (b) . 

The advantage of higher order elements is not as real as is often claimed 
in FEM literature, since the additional computation induced offsets the reduction 
in grid points. Moreover-, this reduction is not as monvimental as one might 
perceive (going from linear to cubic elements might cut a 1000-point grid 
to one having 600-700 nodal variables without much difference in accuracy) . 
However, from personal experience the cubic element code can be around a 
factor of six or eight times slower than the linear element code. 

Finally, boundary conditions are usually no easier to treat unless 
higher order elements which carry derivatives as nodal point variables are 
used. If such is the case, derivative boundary conditions are no different 
from function boundary conditions, in some cases. Depending upon the 
finite element algorithm involved, some computational boundary conditions which 
are fairly easy to implement in FDM may now cause difficulties in the FEM imple- 
mentation. For example, if we are implicit time marching with symmetric 
positive definite matrices to invert, the application of computational 
boundary conditions such as linear or quadratic extrapolation destroys the 
matrix symmetry and thus leads to more troublesome equation solving- 
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The Adverseness of Finite Elements in Fluid Dynamics 

Perhaps it may be useful to enumerate some difficulties which are to be 
ejq>ected in" computational fluid dynamics. For problems governed by the Navier- 
Stokes equations, several coupled second order partial differential equations 
in two or more space dimensions are to be solved. The calculation of mixing 
or separated flows with recirculation and possible embedded shocks and boundary 
layer regimes in regions with curved boundaries require for economy 
irregular grids. As is usually the case, the several sets of dependent flow 
variables coupled with mesh refinement sufficient for problem resolution 
demands from several hundred to several thousand nodal point variables. 

(For example, a coarse mesh for the boattail afterbody problem discussed 
herein requires 1681 grid points (41 x 41 mesh) and 6724 nodal variables 
(4 equations) to be time marched to steady state.) Fur^hhermore, for problems 
with chemical reactions involved, the number of governing equations increases 
still more, and the equation set of the numerical model is usually stiff. 

For such huge calculations, applications esqierience seems to promote a 
tendency to "the axiom of simplicity: the more complicated the problem from 

the viewpoint of number of equations, number of grid points necessary, etc., 
the less likely is an economically successful solution by means of the 
more refined numerical model ("the implicit schemes with high order of 
accuracy) , due simply to the shear mass of calculations involved, computer ' 
storage requirements, and complexity of the coding. 

Indeed, the advent of the vector computer has indicated that for certain 
problem size ranges the simple explicit time marching schemes with res"trictive 
stability limitations surpass the theore'tically unconditionally stable implicit 
schemes in overall machine economy. Consequently, for time asymptotic marching 
implicit schemes have of late fallen more out of favor, although this is not 
the case for one-space coordinate marching as for the parabolized Navier-Stokes 
flows (ref. 19) . 

It appears from the present study that in general time marching even the 
simpler finite element schemes are more cumbersome to implement and in turn less 
economical than is the average finite difference scheme of comparable accuracy. 
This is due largely to the global nature of the fraction approximation as well 
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as to the more general scope of the FEM method in terms of gridwise applicability 
(even with regular grids the method of programming generally does not deviate 
a great deal from that for the irregular grid) , as well as the averaging 
process the FEM employs in discretizing space derivative terms (ref. 20) . 

Further time-consuming function evaluations result from the need to evaluate 
finite element area integrals by means of quadrature schemes (ref. 15) when 
nonlinear terms are present. Some of the adverseness of the finite .element 
method will now be fxrcther elucidated. 


Relationship Between FEM and FDM Discretizations 

The Linear Element Model . Consider the problem of modeling the 
vorticity transport equation (u and v assumed constant) 
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employing a linear element finite element model. 

bet {(j)^(X/y) :J - 1, 2, n} be the triangular element piecewise 

vJ 

linear shape functions associated with the nodes of a triangulated domain 
(see figure 1) . These shape functions are defined on a triangle with 
vertices at points (x^, y^) , (x^, y^) , (x^, y^) by the equations 
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The finite element model equation at node point J as produced by the 
Galerkin technique is 


8 



dA 


r (j)_ dA 




3q~ 
3y 9y 


( 3 ) 


FEM Equations on a Nonunifonn Grid . Exi)anding q in terms of nodal 

values q in the form 
J 


‘n 

q = Z! qj(t) <j)j(x,y) 


and substituting in equation (3) we obtain the interior point discretized 
finite element equation associated with node point J which will now be 
indicated (refer to figure- 1) . Here 


< { ) > = the corresponding finite element discretization of term 
( ) of equation (IJ , and 

A^ = area of triangle i 

The contributions to the equation at node J are 
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X-diffusion term 
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y-diffusion term 


In equation (5) replace x by y and maintain subscripts, 
q values as written. Here y = Y-r. ~ Y / etc. 


( 6 ) 


X-convection term 
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Y-convection term 


Rewrite equation (7) with x and y interchanged, but 

retaining all q and all subscripts as at present. 
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FEM Equations on Uniform Grid . It is of interest to observe that for a 
uniform grid with 


^F - ’^6 = ■' ' 
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the suitable addition of terms (1-8) produces the discretized finite element 
(central space differenced) equations 
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Here we see that second space derivatives have been central differenced and' 
first derivatives approximated with weighted averages of offset centered 
difference approximations. The inversion of the time derivative coefficient 
matrix before making the analysis would allow the observation that the space 
discretizations are all weighted averages of centered differenced terms. 


Inconsistent FEM 

Consider Roache's question (ref. 1) — what is a finite element method? 

From equations (4 'to 8) we can see in this instance that a Galerkin method 
employing linear elements produces a space discretized finite difference 
scheme which, is second order accurate and which possesses arbitrary variability 
in grid point location. Of course, for good results the triangles should 
not be too distorted, with in particular large obtuse angles not allowed, 
although small angle restrictions are not as significant as originally 
supposed (ref. 21) . For uniform grids the method analyzed' applies centered 
space differences to diffusion terms and weighted averages of such differences 
to convection terms. 
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When applied to transient problems the method is inherently implicit. 

It produces systems of ordinary differential equations coupling nodal 
variable time derivatives which are of the form 

B ~ = f(Q) (10) 

Should one wish to numerically integrate employing explicit schemes, 
equation (10) presents a matrix inversion barrier at the outset, since B is 
banded and of significant bandwidth. 

This implicitness can be avoided (observe in eqtiation (4) the result 
of Taylor's e:q>ansions about the central point A) by lunping B to obtain 
an easily inverted diagonal matrix, according to the relations 


D. . = 
XX 


E 


B. . 
11 


D. . 
11 


0 


i j • 


On a uniform grid the lumped system 


dt 


D-1 f(Q) 


( 11 ) 


(12) 


is consistent except near the boundaries. For a nonuniform grid the lumping 
process globally lowers to first order the accuracy of the transient solution. 
However, for uniform grids this occurs only near the boundaries. 

This deterioration in accuracy on nonunifo 2 nn grids is a compelling 
argument against lumping. For this reason the technique of simplifying 
geometry by stretching transformations with uniform, mesh FEM applied in the 
transformed plane gains some favor. However, why do the discretization with an 
FEM,, considering its chief advantage is complex geomet 2 :Y capacity and that 
furthermore it always applies centered differencing to convection terms? 
Historically, the raison d'etre for the evolvement of the Lax-Wendroff type 
method is to obtain second order accuracy while at the same time preserving 
the stability characteristics of the upwind techniques. As traced by 
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Roache (ref. 22) this historical evolution involved successive improvements 
over the forward time central space method via DuPort Frankal, the upwind 
methods, Lax-Wendroff methods, in that order. The cumber someness of applying 
upwind differencing with FSM seems to exclude such a similar chain of 
evolution; this type of differencing does not appear to be naturally 
inherent in FEM discretizations (ref. 20) . 

Extra Stiffness of FSM Systems 

Consider equation (9) with convection terns omitted. A Von Neuman-Fourier 
resolution 


r(x,y,t) = C(t) expCK^x + ny) ] / i = 


(13) 


then produces the amplitude evolution characterized by 
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dt 
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Here 5 , ri are frequency related’, a results from time derivative coupling. 
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Letting 


A'=aA., A = C = i1/ Ax = Ay, 0 = AAx , 
we may deduce that 
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Hence the time cJerivative coupling has produced a stiffer system by a 
factor of approximately 3.5 than would an ordinary FDM method with centered 
space differencing. Considering the assumptions of the analysis, this is 
in excellent agreement with the experimental observations of a factor of three 
or four (ref. 23). 

In terms of stability ‘ implications the forward time centered space FDM 
method applied to the diffusion equation is restricted by (Ax = Ay) 


8 = 



(19) 


whereas forward time differencing of equation (9) restricts the resulting FEM 
algorithm according to 


3 



( 20 ) 


Of course, the same considerations apply to equation (1) , with extra 
complicatiqn . 

Storage Considerations 

It is clear the implicitness of FEM methods, if tolerated by avoiding any 
sort of lumping, will require extra storage. What is perhaps not so obvious 
is the storage cost resulting from the capability of totally irregular grid 
point location. The shape of triangular elements can be used to advantage in 
accurate fitting of curved boiondaries; hence, transformations are not 
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essential in this respect. However, some penalties for this generality are 
necessary; most notably, the increased grid disorder requires increased 
storage of information concerning grid character, such as node coordinates and 
their triangle associations. 

For example, on a 41 x 41 grid with 3200 triangular elements the storing 
of 4 bits of information per triangle (node numbers of three> vertices and 
boundary triangle indicator) requires 12800 storage locations. (In contrast, 
on an FDM mesh the reference point at which a difference equation is to be 
written is located by the I,J indices inside a nested DO loop, with index 
incrementation for determining neighbor points; consequently, no storage of 
grid information of the above type is necessary.) Moreover, indication of 
whether each node is unrestricted (is there an ecjuation to be written or 
not?) requires 6724 bits of information, on the 41 x "41 grid with_4 dependent 
variables per node. 

For the implicit case 

B ^ = f (Q) , (21) 

with linear triangular elements and the 41 x 41 grid B is a symmetric 
banded matrix of bandwidth 83, of dimension 1681 x 1681. Moreover, one such 
system is required for each of the four fluid dynamic dependent variables . 
Problems of this size necessitate out-of-core solving with associated 
buffer needs. For this grid the split-band Cholesky method (ref. 24) 

(see Appendix I) requires a buffer of minimum size 9000 words for inversion 
of B , with 4 X 1681 = 6724 storage locations for f . Out-of-core 
frontal methods require less storage (refs. 25, 26) , but would seem more time 
consuming on a per step basis since their implementation would necessarily 
require inversion of the matrices involved every time step. Frontal 
utilization would appear to have as a corequisite a fully implicit time 
integration procedure. This appears vastly more complex to implement than 
the split-band Cholesky method in which B is inverte_d only once with 
subsequent front and back solves utilized whenever is needed. 
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The picture emerging is that finite element implicitness and variable • 
grid generality tends to cumbersome application and more heavy core storage 
requirements than is characteristic of finite difference methods (with • 
MacCormack ' s method on the lower end of the spectrum and ADI the higher end) . 

As regards the cumbersomeness of FEM, the device of time step doubling 
(ref. 18) which cuts total computer time a factor of 2.5 to 4 with MacCormack 's 
method, would be difficult or impossible to apply with quasi-explicit 
nonliimped or fully implicit FEM, due to system nodal variable coupling. 

Time Considerations 

It is clear from equations (4 to 8) that equation assembly time for the 
variable grid difference scheme probably should exceed by an order of magnitude 
that of a uniform grid scheme, particularly for the FDM technique_ where one 
numerical coefficient may receive from two to six contributions from 
separate sources. Furthermore, the estimate represented by this academic 
example is clearly optimistic; for the primitive variable Navier-Stokes 
problems practically every equation term exhibits nonlinearity, often to the 
extent that the algebraic manipulation impedes or the type of nonlinearity 
prevents a priori tabulation of the simplest form of the expression evaluated. 
Such evaluation is accomplished by calculating area integrals over elements 
of the geometrical domain by quadrature schemes (which with higher order 
function approximation can involve several quadrature points per element, if 
one is not to degrade the order of accuracy otherwise e:sq»ected (ref. 15) ) . 

Note from equations (4 to 8) that each additive term in an equation 
coefficient represents a contribution from a separate triangular element. 

This is the key to the FEM equation assembly concept; one proceeds triangle 
by triangle computing . contributions to all equations associated with nodes 
of each triangle, adding each contribution into the proper place. It is 
this equation assembly process v;hich consumes the greatest relative 
computation time through function evaluation for the numerous contributions 
to one coefficient. 

When the governing equations are largely composed of linear, terms , 
as for stream function-vorticity formulations or some potential flow 
situations, with troublesome a priori algebraic manipulation the matrix 
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element contributions can be reduced to simplest possible form for rapid 
evaluation, as done by Bratanow (ref. 23) and Baker (ref. 4). Such a scheme 
represents some improvement over the machine quadra txire approach; however, it 
is clearly impractical for higher order elements and/or highly nonlinear 
equations such as primitive variable Navier-Stokes. In any event this algebraic 
manipulation represents further c timber someness in application. 

Here one is impressed that even though geometric stretching trans- 
formations may increase the ntimber of terms in the governing equation set, 

depending upon the complexity of nonlinear coefficient evaluations the resulting 

✓ 

FDM application on a uniform grid appears should have much leeway in central 
processor time (due to simpler equation assembly) over FEM applied on the 
original domain. Moreover, the FDM transformation approach exactly accounts 
for boundary curvature, while FEM boundary approximation by triangular 
elements actually introduces possible second order roughness effects. 

FINITE ELEMENT MODEL OF NOZZLE AFTEEBODY VISCOUS 
INTERACTION EFFECTS WITH PLUME SIMULATION 


Flow separation has serious consequences in many fluid dynamics 
applications. Once separation occurs at the boundary of a submerged body, 
the resulting flowfield behavior departs radically from that predicted by the 
inviscid flow theory, because now the separating stream causes effectively the 
formulation of a new flow boundary. This added irregularity of the effective 
body geometry due to flow reversal and recirculation with resulting downstream 
turbulent wake can produce a relatively large increase in total vehicle drag. 

Current interest in reducing aircraft drag is evidenced by the appearance 
in the literature of large amounts of e^erimental data and investigative 
study of various features of the problem. The difficulty of ejqierimentally 
predicting the flow characteristics of a particular body geometry, due to 
the hot jet exhaust plvime and Reynolds number scale effects, points to the 
need for accurate mmerical simulation procediores. Calculations for the most 
general free plume case are not known to the author. However, for certain 
nozzle pressure ratios the plume simulator (see figure 2) produces results • 
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close to free plume results (ref. 18) . The simulated flow field contains 
many of the same features as the free plume flowfield; separation and re- 
circulation; separation shock; turbulent boundary layer on the boattail; and 
a turbulent recirculating region. 

For the plume simulator MacCormack's FDM method has proven to be a notably 
successful numerical simulation device (ref. 18) . It provides a computation- 
ally feasible tool which allows insight into the viscous/inviscid problem 
interactions, without the encumbrances characteristic of the weak interaction 
techniques which iteratively calculate separate solutions in the boundary 
layer and inviscid flow regimes, with consequent matching difficulties at 
common boundaries. The flow and geometry characteristics of the problem as 
well as availability of FDM results make -it a good test case for comparative 
performance of FEM/FDM technology. 

The subsequent sections of the present paper are addressed to the develop- 
ment of FEM algorithms for the solution of viscous compressible flow problems 
with possible embedded shocks and recirculation regions. As a logical first 
step in the’ development of such a computer code we consider the calculation in 
Gatesian coordinates of unifora flow on a rectangular region' which 
encotmters an embedded - oblique shock with known turning angle. Having control 
at -the boiandary of -the location at which the shock is introduced allows 
fairly accurate knowledge of where the shock should form (see figure 3) . 

The code so developed then allows reasonably simple modification for 
computation of the boattail, pltrae simulator problem, couched in axis- 
symmetric cylindrical coordinates. In both cases laittinar flow is assiamed; 
however, simple eddy viscosity or two layer turbulence models could be 
easily in’troduced. 


The Navier-Stokes Equations 

The equations governing the flow of a compressible viscous fluid in 
the absence of body forces and electromagnetic effects can be written in 
weak conseraation law form as follows; 


3U 3F . 3G 

3t 3x 9y 


H = 


0 


( 22 ) 
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Nondimensionaliaing the governing equations using free stream density 
(P „) t free stream x-direction velocity (u ) , boattail exit diameter (D ) , and 

CO Q 

reference temperature 


T 


Ref 



(23) 


the following dimensionless equations hold; 


Cartesian Coordinates 
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Here the viscous stress relations are 
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( 28 ) 


and the heat flux components are 


^x P„R 3 x 


ky 9 t 

R e 


_ _ ky 3 t 

^ P„R 9y 
Re 


(29) 


The nondimensionalized constitutive relationships are Sutherlands viscosity 
law 


■■ 


(30) 


the perfect gas law 


P = (y - l)pT ; 


(31) 


and the specific total energy definition. 


E 


. u^ + v^ \ 

= p^T + — ^ j . 


(32) 
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Also 
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k = thermal conductivity 
= Reynolds number 
= Prandtl's number 
p = density 

u = x-component of velocity 
u = y-ooraponent of velocity 
p = static pressure 
T = temperature 
Y = 1.4 
]i = viscosity 

Axis-syitanetric Cylindrical Coordinates 

Using the previously described nondimensionalization , with x the 
axial and y the radial direction, the equations completing definition 
of the governing equations in axis -symmetric coordinates are given below. 

The constitutive relationships are as previously defined, as well as the heat 
flux ejqjressions . However, the following equation changes are necessary: 
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U = yU 

0 



G = yG 

■ ^ + '"00 

F = yF 

,0 



( 33 ) 


Viscous stresses 



(34) 


The Galerkin FEM Equations 

The first step in the FEM discretization of equation (22) is the 
triangulation of the computational domain D with bounda 2 ry P . Spatially 
piecewise linear function approximation is accomplished by associating shape 
functions of the form indicated by equation (2) with each vertex of a 
triangle. Trial functions are given by 

N 

U(x,y,t) = V U_(t)<f>_(x,y) , (35) 


where u is the four component vector specifying the value of the vector 

u 

(23) at the J-th node, and N is the total number of nodes. 

The value of Uj(t) is determined using the Galerkin technique by 

forcing for every node at which a component of U is not specified 

J 
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by a boundary condition the corresponding component of the (Galerkin) 
equation 


rr * 30 rr [ * 1 

///j a? ^ ° ar ■ V J 

(36) 

+ r <l> [Gdx - Fdy] 

F J 

to hold. This yields four coupled systems of ordinary differential 
equations 

S^i 

[A^3 f(q^) , i = 1, 2, 3, 4 (37) 

to be time integrated. Here 


, - // Vk ® 

and q^ is the collection of all nodal values of component i of the 
vector (23) which are allowed to vary with time- 

The time integration of equation (30) is accomplished using the explicit 
self-starting maximally stable predictor-corrector algorithms of reference 
27. For the oblique shock calculations the matrices of equation (30) 

were lumped by the procedure of equation (11) to avoid the inversion. 
Integrals over a triangle arising from the left member of equation (36) 
are a priori exactly evaluated as 


// dA = 

Tl 



J = K 
J ^ K 


(39) 
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where A is the area of triangle . The integrals over triangles of 
the right member nonlinearities are evaluated by one point quadratures, with 
two point quadratures applied to the boundary integral terms. For the 
boattail afterbody calculations on a highly uniform grid (see figure 4) the 
inversion of the matrices in equation (30) is accomplished by developing 
a special purpose out-of-core equation- solving routine, the split band- 
Cholesky solver (see Appendix I) . 

A fourth order damping scheme' is used to smooth oscillations resulting 
from shocks or other large flowfield gradients. The additive damping 
corrections employed are described in reference 18. 

Boundary Conditions 

The computational domain for the oblique shock calculation is shown in 
figure 3. Uniform flow conditions .are prescribed' on the inflow and prior 
to the point of introduction of the shock between nodes six and seven on the 
top boundary, including point seven and along the remainder of the top 
botindary altered uniform flow conditions corresponding to the flow having 
been turned by ,a shock at an incidence angle of 23 degrees are prescribed. 
Computational boundary conditions are applied along the bottom and butflow 
boxondary; zero normal gradient (£2 = fi) along the bottom and linear extra- 
polation (f 3 = 2 f 2 - fi) on the outflow. 

The computational domain for the boattail afterbody problem is shown 
in figure 4. The points of the rectangular appearing grid were obtained as 
in reference 18,. where -separate stretching in the two coordinate directions 
was employed in mapping the points in the physical plane (also the 
computational plane for the PEM calculation) onto the nodes of a uniform grid 
in the (transformed) computational plane. The parameters of the stretching 
mappings were chosen to concentrate more points along the wall and in the 
region of recirculation. The physical plane gridwork so obtained was 
triangulated for FEM purposes by drawing in the left-to-right and bottom-to- 
top diagonal of each rectangle. Thus, the mapping yields a readily automated 
grid generation technique. 
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The boundary conditions along the upper and outflow boundaries utilize 
linear extrapolation of interior point data for the variables p , u , v , 
T with S determined from eqpaation (32) . Along the wall pressure is 
linearly extrapolated; the no-slip condition specifies u,v ; and 
T = is specified; p , E then depend upon ^ , T as specified by 
equations (31 to 32) . (The validity of the linear extrapolation requires 
placement of the top and outflow boundaries sufficiently far removed from 
the boattail curvature, with the upper boundary essentially in free stream.) 
The flow conditions on inflow employ profiles obtained from a combination 
method of characteristics/boundary layer solution as described in 
reference 18. 


NUMERICAL RESULTS 


Oblique Shock Flow 

The main purpose of the oblique shock calculations is to demonstrate 
for a problem with known solution the correctness of the finite element 
code and its applicability to flows involving shocks. Calculations are 
performed with Mach number M = 3 in the uniform flow, Prandtl number 
=5 0.72 and Reynolds numbers of 81 and 31,000. Table 1 shows the 
test cpnditions used to simulate the introduction of the shock between two 
grid points on the upper boundary. For the low Reynolds number, weak shock 
(R^ = 80.869) convergence was achieved in around 150 steps on a uniform 
21 X 31 grid with Ax = .05 , Ay = .0333 and At = .01 . The criterion 
for convergence of calculations herein discussed is 



for all components of U . No artificial viscosity was added. 

For the crisp high Reynolds number (R^ == 80,869) shock mesh refinement 
became necessary. Since the shock did not penetrate the bottom half of the 
previous flow domain, this lower half was truncated and a 31 x 41 grid 
(Ax = .0333, Ay = -0125) imposed on the top half. Applying artificial 
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Table 1. Conditions for Oblique Shock Simulation. 


Uniform Flow Conditions 

M = 3.0 
p = 1.0 
u = 1.0 
• U = 0.0 

T = 0.19841 


Post Shock Conditions. 

M = 2.75799 
p = 1.29341 
u = 0.96537 
n = -0,08159 
T = 0.22034 



.1 


viscosity of strength C = -1 , C = ,1 (see reference 18) to damp spurious 
oscillations arising from the crispness of the shock, convergence was achieved 
in 1000 steps with At = • 001 . 


Results for the high Reynolds number case are presented in figures 5 (a to d) , 
through comparisons between FEM and MacCormack's FDM method (ref. 18), For the 
FDM calculation the original 21 x 31 grid was employed; hence, the exaggerated 
overshoot exhibited in some' instances by the FDM results should not from the 
present data be considered a liability of the method, but is probably the result 
of mesh coarseness. Figure 5(e) shows pressure contours over the computational 
domain. Note that pressure values have been multiplied by 1000. The roughly 
one percent error fluctuations in pressure which appear in the uniform flow in 
front and back of the shock are attributed possibly to the effects of lumping 
boundary error propagating downstream, or else to the error criterion not being 
strong enough to assure convergence of the calculation at the time of 
termination. 


The introduction of the high Reynolds nvunber shock at the top boundary is 
accomplished by imposing uniform flow conditions on the inflow and at every 
top boundary grid point through point six, with post shock values at point 
seven and thereafter. Hence, due to the linear element model the shock 
behavior along this boundary did not exhibit actual jump discontinuity; but 
rather, linear ramp function transition over the interval .1667 _< x ^ .20 , 
Similarly, due to the coarser grid the corresponding transition for the FDM 
calculations occurred over the larger interval .15 x .20 , Effectively, 
neither calculation pinpoints shock intersection with the boundary, but both 
model the proper jirnip conditions across the smeared transition region' 
corresponding to a shock at 23 degrees angle of incidence. 

By drawing a straight line at 23 degrees angle of incidence and intersecting 
the top boundary at x = . 1667 , the intersection of this line with the grid 
lines X = should yield a y-value in close proximity to the theoretical shock 
jump location. The level line on figures 5 (a and b) in which pressure and 
density ratios across the shock are plotted indicates this location. Hence, 
the calculations appear in good agreement with theoretical shock jump locus 
to within the limits of the accuracy with which this location can be specified. 
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Boattail Afterbody Flow With Separation 

In reference 18 numerical solutions have been obtained for axis- 
symmetric boattail bodies with solid sting plume simulators, for several 
geometric configurations mostly from the family of boattails with circular arc 
generators. The accuracy of MacCormack's method has been verified for this 
class of computations by comparisons with wind tunnel results. Thus, one can 
confidently prove the FEM calculations by comparison with the corresponding 
PDM results. 

The configuration chosen for the present calculation is as follows: the 

streamwise length of the computational domain is x^^^ = .688 m ; exit diameter 

is D = .0914 m ; boattail radius of curvature is .894 m;' boattail length is 

e 

.229 m; boattail angle is 8 = 14.8 degrees. The inflow boundary is placed one 
maximum diameter of .159 m upstream of the beginning of boattail curvature, to 
escape feedback upstream (through the boundary layer) of- boattail expansion 
effects. Grid point 8 'in the x-direction is the first and grid point 26 the 
last grid point associated with boattail curvature. Characteristic parameters 
for the flow calculation are free stream condition = 1.3 , total temperature 
T = 340®K , Y = 1-4 , Prandtl number p = 0.72 , reference Reynolds 

U/® R 

number R^ = 12365 , reference' temperatxare ~ 432,883 , wall temperature 

T = 311.4'’K , s = .4 . 
w 

In order to reduce central processor time requirements in the calculation 
of a typical boattail flowfield the FEM calculation was initialized with output 
from the MacCormack code which had been allowed to run several hundred steps. 

Even so, on the coarse mesh (41 x 41) the FEM program -required another 400 steps 
to converge, with time step .0002. (Further calculations beginning with a flow- 
field obtained by reproducing the inflow at all downstream locations are landerway, 
and will be reported if completed in this investigation period.) 

Figures 6 (a to e) exhibit comparisons of density, velocity, pressure and 
temperature profiles at various downstream stations, for the two computational 
methods. All profile .stations except the first occur within the region of 
recirculation which is clearly exhibited. The FEM results do not seem to 
resolve as much of the finer details of the flow as the FDM results, although' 
the overall disagreement between the two is not significant. Perhaps it may 


28 



be of interest to note that in the expansion region where the FEM physical 
plane grid is most irregular the disagreement is not as significant as in that 
portion of the recirculation region extending onto the plume simulator. Hence, 
the transformation approach with regular mesh calculations in .the transformed 
plane appears to be able to more uniformly account for significant geometric 
transition effects. However, these calculations do appear to establish that 
the FEM method can accomplish well the job required in the original geometry. 
Hence, the future of the method, if any, in transient calculations, may lie in 
those areas where the transformation approach encounters difficulty; e.g., 
where the simple stretching transformations are not sufficient. Here one could 
envision channel flows with significant channel curvature and irregularity in 
cross section which varies with flow direction. 

Comparisons of Computational Efficiency 

In this section are presented some efficiency comparisons for cases in 
which a particular physical problem has been solved numerically by both FEM 
and FDM algorithms. No claims are made that the respective computer codes 
were as highly optimized as possible, or that different paths of algorithm 
design might not have produced varied results. Nor is the data for comparison 
as extensive as might be desired due to the general expense of generating 
finite element results. However, one might expect this data to predict 
the general characteristics typical of FEM/PDM resource economy. 

Table 2 exhibits comparisons of particular indicators of computational 
efficiency. The methods studied together with references docxmenting precise 
algorithm formulation ■ are as follows: for free shear layer flows (at 

R = 1000) consistent implicit FEM algorithms for marching solution of the 
time transient (ref. 17) and nonlinear nontimelike block iterative solution 
(ref. 5) of the steady Navier-Stokes equations are compared with an alternating 
direction implicit (ADI) FDM. method which. employs upwind differencing of 
convection terms (ref. 28) . Likewise, the consistent linear element FEM 
algorithm herein discussed is compared with MacCormack's nonsplit method 
which employed time step doubling (TSTD) (ref. 18) , applied to boattail after- 
body calculations . 

In all cases exhibited it may be emphatically concluded the FEM structure 
is far greater demanding of core storage and time per computational step 
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Table 2. Comparison of FEM/PDM Performance Characteristics 


Method 

Equation 

Dynamics 

Number of 
Governing 
Equations.’ 

Flow 

Class 

Total 
Problem 
Unknowns 
(Humber of 
Nodes* Number 
of Equations) 

Storage 

Ka 

Total Unknowns 

Cpu Time* 
Sec/Step/ 

T'otal Unknowns 
(CDC-6600 

Computer 7‘lme Units) 

Apparent 
Maximum 
Step Size 

Steps to 
Convergence 

C" Cubic 

Transient 

3 

Shear 

Layer 

1834 


.0763 

.03 

150 

ADI 

Transient 

3 

Shear 

Layer 

3000 


.00125 

.3 

30 

C° Cubic 

Steady 

3 

Shear 

Layer 

1084 


.0545 

NA 

15 

Consistent Linear 
Elements 

Transient 

4 

Boattail 

6724 

.033 

.00170 

.0006 


Honsplit 
HacCormack with 
TSTD 

Transient 

4 

Doattall 

6724 

.011 

.00031^ 

.0046 

650 coarse 
stops 

Lumped Linear 
Elements * 

Transient 

4 

Boattail 

10404 

.0134 

.00108 

NA 

NA 


* Data extrapolated £rom consistent linear element case by sid>tractlng equation solve time from total step time, and buffer 
storage from total storage, 51 x 51 grid. 

^ All boattail calculations were performed on the CYBER 175 computer j all others on the CDC-6600' computer . To obtain 

equivalent cpu times for comparison, CYBER units of time have been converted to CDC-6600 units by employing a muillpllcatlve 
factor of 2.5. 

^ Total cpu time to convergence divided by total numljer of coarse steps required. 



than is the FDM algorithm and appears to require a smaller step size. Of 
course, the most conclusive efficiency comparison that one can initiate is 
the measurement of • total problem solution time, as dictated by -maximum 
step size and number of steps to convergence. In this respect MacCormack's 
method with TSTD required only 650 coarse steps and 544.5 seconds of cyber 175 
computer time for convergence on the (R = 12365) boattail afterbody problem 
with a 41 X 41 grid. Here one coarse step is equivalent to several fine 
steps (ref. 18) . - Present indications are that the consistent linear element 
FEM code can run at a step size in the range .0004 to .0006. Knowing that 
TSTD cannot be used to speed the computation one can at best project an 
efficiency defect in total computation time of above an order of magnitude. 

The best showing for the FEM method occurred for the free shear layer 
code formulated at the outset to solve the steady governing equations. Here 
the excessive time requirements on a per iterative step basis are offset by 
only a few steps being needed for convergence. However, even this performance 
was still not competitive with the ADI performance (ref. 28) . 

The per step cpu time requirement for the consistent linear element 
code was extrapolated to that needed for a lumped code by subtracting matrix 
inversion time; notably,, forty percent of the total time needed per step. 
Hence, it appears that a lumped algorithm applied on a stretched domain could 
be as efficient as the consistent code on the original domain (or better) . 

CONCLUSIONS 


The results of comprehensive numerical investigation concerning the basic 
capabilities of the finite element method as a tool for numerical calculations 
related to compressible flow problems indicate that rt provides an acc^lrate and 
reliable solution technique. However, the inherent ease with which variable 
grids and complex geometries can be handled is offset in most cases by a 
cumbersomeness in application which (in comparison to the better FDM methods) 
hinders economic calculations to the extent of rendering the method unfeasible, 
particularly for time marching solution of the more complex aerodynamic flow 
problems . 
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Fully ejcplicit FEM algorithms on nonuniform grids are hindered by order 
of accuracy estimates, while due to spectrum broadening stemming from time 
derivative coupling quasi-e^^licit schemes are hampered by more rigid 
stability restrictions as well as matrix inversion at the outset. Such 
difficulties can be greatly alleviated by yielding to inherent FEM implicit- 
ness through use of implicit numerical integration, at the cost of matrix 
assembly and inversion each time step. Due to core storage requirements, 
for larger problems this approach is not at all feasible unless frontal 
equation assembly and solving is employed. It is hard to believe that 
such an approach would be competitive with the better finite difference 
-methods', in particular since it has been observed in shear layer calcu- 
lations that implicit methods can be in practice time step restricted to 
near the explicit CFL limit. However, frontal solution appears the 
most economical route to follow if the FEM method is employed as a time 
marching scheme. Of .course, one might expect further increases in efficiency 
by time splitting the equations and applying one-dimensional finite elements 
to obtain FEM-ADI algorithms. However, the next question to consider is 
whether by such an approach the variable grid-complex geometry capability 
might not- be weakened. It thus -appears the, greatest future of the method 
lies in problem areas where solutions of the steady governing equations are 
sought at the outset. Here the more rapid convergence expected of nonlinear 
nontimelike mathematical iteration processes appears to offset the high 
per step overhead of the method tending to more nearly provide economic 
competitiveness wiuh FDM results. 

The following major points have been established by this investigation: 
the strongest assets of the finite element method are its (a) capabilities 
for reliable and accurate calculation employing variable grids which 
readily approximate complex geometry and (b) capably adapt to the presence 
of diverse regions of large solution gradients. Complex flows with embedded 
shocks and regions of recirculation can be accurately calculated without the 
necessity of domain transformation. However, for sufficiently complex 
equations FEM time marching schemes as presently conceived do not appear able 
to complete economically with the better FDM algorithms of comparable accuracy, 
due largely to the sheer ntimber of computations involved. Greater overhead 
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may be expected in terms of both, computer resources and man-year efforts 
required to produce auxiliary routines such as grid generators and out-of-core 

V 

solvers. Some inflexibility is associated with the method; e.g. , upwind 
differencing is not naturally achievable with the general element, and time 
step doubling would be difficult or impossible to perfom. Finally, the 
greatest weakness of the finite element approach is the total computer 
processing time and storage required for problem solution. 
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Figure 2. Boattail plume simulator flow field. 
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Figure 4. 


Boattail computational domain, with irregular triangular element formation. 
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(a) Pressure jump. 



(b) Density ratio across shock. 

X = . 35 X = . 60 X = . 90 


Figure 5. oblique shock FEM-FDM computational results (Re = 80,869). 



o Inconsistent (lumped) linear element FEM 
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(c) Streamwise velocity profiles. 



(d) Normal velocity profiles. 
x = .35 x = . 60 x = .90 


Figure 5. Continued. 
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(b) Pressure profiles. 


Figure 6. Continued. 
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APPENDIX I 


A SPLIT BAND-CHOLESKI EQUATION-SOLVING STRATEGY 
FOR FINITE ELEMENT ANALYSIS OF TRANSIENT FIELD PROBLEMS 

ABSTRACT 


An efficient strategy is outlined for out-of-core solution of the large 
systems of -equations which specify nodal point time derivatives in finite 
element models of transient flow problems. The positive definiteness, 
symmetry, and band structure of the finite element mass matrices, as well as 
the nature of the equation assemblage process, are exploited by the method. 
Computational results are indicated for systems on the order of several 
thousand unknowns in size. 


INTRODUCTION 


A major drawback of finite element as opposed to finite difference 
models for transient flow problems is -that in order to obtain consistent 
space discretized equations one invariably introduces time derivative 
coupling of the state variables. For example, space discretized Galerkin 
models of fluid dynamic interactions governed by the Navier-Stokes equations 
lead to coupled initial value problems of -the form 


A = f(Q) , t ^ to 

Q(bo) = io 


(A-1) 


In finite element terminology A is the mass matrix of the system (and 
therefore, symmetric positive definite and time invariant, regardless of 
physical governing equation characteristics or element type) ; Q is the 
vector of nodal displacements; and f is a nonlinear function of Q and 
boundary constraints. 


-Preceiliitg pag^ 


47 



Depending upon the' characteristics of the problem and background of the 
investigator, the inherent implicitness of equations (A-1) has been treated 
in various ways. Early proponents of implicit time marching schemes such 
as the Crank-Nicholson Galerkin method (ref. 29; , intuitively reasoning matters 
cannot be worsened by so doing either attack equations (A-1) directly with 
implicit numerical integration schemes, or else retreat within the Galerkin 
integral and time discretize implicitly, with resulting marching equation 


B 

n 


“n+l 



n 

past 


(A-2) 


In such cases the matrices need not be positive definite or even 

symmetric, which complicates the choice of linear system solver. 

On the other hand, a direct attack on the system (eqs. (A-1)] by 
application of classical explicit methods for numerical integration or 
ordinary differential equations is. hampered by the derivative coupling. 
Assuming lumping is an unacceptable or unworkable alternative (consistency 
in the transient is desired and/or higher order elements on nonuni-form 
grids are employed) the tandem matrix inversion required results in what • 
shall be termed a quasi-explicit time marching shceme. Taylor and Davis 
(ref. 30), possibly motived by the success of MacCormack's method in nvimerical 
fliiid dynamics, have experimented with predictor-corrector methods. A typical 
example is the quasi-ej<plicit modified trapezoidal (lowest order Runge-Kutta 
scheme) 


A Q. 


f 

n 


^n+1 "" ®n *2 ^^n+1 ^n^ 


(A-3) 


where x = t -t , f =f(Q), and f* = f(Q*) . A. J. Baker . (ref. 31) 
n+l n n n 

has achieved some degree of success with quasi-explicit techniques based upon 
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maximally stable 3-stage one-step self-starting predictor-corrector algorithms 
(ref. 31). 

The relative efficiency of implicit versus quasi- exp lie it approaches 
is not decided here. Suffice to say it appears the implicit methods should 
have an edge, since they are usually absolutely stable, hence have no severe 
time step limitations. However, for hyperbolic or mixed problems the physics 
of the flow naturally restricts the time step via the Courant condition 
(CFL-time-step restriction for explicit finite difference schemes (ref. 22)). 

This degrades somewhat the advantage of absolute stability usually associated 
with such schemes. For instance, computational results (ref. 28) indicate the 
theoretically stable alternating direction implicit (ADI) finite difference 
method has to be time step restricted to near the explicit CFL limit in order 
to achieve convergence in the calculation of supersonic compressible free 
shear flows. 

As a consequence, it appears the time step advantage of implicit over 
quasi-explicit methods may not in all oases be as monumental as at first 
perceived. The question of how much this advantage can be further offset 
hinges upon the relative times for assembly of the equations involved, the 
core storage requirements resiiLting from program complexity as well as buffers 
for matrix inversions, and execution speeds for the equation solving. The fact 
that the mass matrix in equations (A-3) is time invariant and symmetric positive 
definite enables inversion by the Cholesky method, requiring matrix decomposition 
only once if axixiliary storage of the triangular components is used, with 
subsequent front and back solves whenever is required. It thus appears 

the cjuasi-explicit method, as regards matrix assembly and inversion, could be 

far more economical, its competitiveness overall then depending largely 

* 

upon the number of times is required per time step. 

For the large systems of equations which appear particularly in fluid 
dynamics applications both classes of methods are further encimbered by the 
necessity of out-of-core equation— solving strategies. (This may prove to be 
a greater liability for implicit methods . ) The frontal technique of 
Irons (ref. 25) , based on direct Gaussian elimination without pivoting and 
accomplished with only portions of the matrices assembled at any one time. 
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has proved an effective solving strategy for syinmetric positive definite 
systems. However, in many cases the matrices of equations (A-2) are 
unsymmetric; for such Hood (ref. 26) has generalized the frontal technique 
to allow nonsymmetry and some degree of pivoting. In this section is described 
an" alternate- equation— solving' choice, the split-Cholesky^ strategy for banded 
matrices. This method is characterized by much simpler programming, in 
comparison, and for large systems should further close the gap between 
implicit and quasi-explicit integration methods. It is particiilarly well 
suited in the circumstance that the triangular decomposition of A is stored 
on disk, with frontsolves and backsolves whenever is needed. For problems 

governed by the Navier-Stokes equations further economy in storage occiirs 
when boundary conditions allow the same mass matrix for more than one set 
of dependent variables, such as often happens for the momentum equations in 
two-dimensional flows . 


THE SPLIT-CHOAESKY PHILOSOPHY 


The basic idea involved in the split-Cholesky equation- solving strategy 
is that for a banded matrix the computation can be carried out in pieces, 
with only a small portion of the matrix residing in core. The essential 
concepts will be viewed from the position of the analyst familiar with band 
matrix terminology. 

First of all, the band structure inherited by the nonzero portions of 

the triangular matrices associated with the LU decomposition of a matrix A 

is_,identical to the band structure of A . This fact and the nature of the 

decomposition sequence permit the decomposition to be stored back on top of 

A as it is performed. Second, if a is symmetric positive definite, then 
. T 

L is the transpose of U{L = U ) ; hence, only one triangular component must 
be computed. If U is chosen, only the diagonal and upper triangular portion 
of. the matrix A of equations (A-3) must be assembled- The computation 
proceeds in stages, with the elements of A to be modified at stage p 
altered according to the formulas: 
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If A has bandwidth 2m+l we see that only rows p through p+in 
of A need be available in order to accomplish stage p of the decomposition 
(see diagram 1) . 

From these considerations it is clear that the Cholesky decomposition 

of a band matrix can be split into several passes. At each pass a minimum 

of p+m rows of A must be available in order to perform stage p of the 

decomposition, with row p the operating row. After this row p_ is no longer 

needed; it can be placed on auxiliary storage- Of course, depending upon the 

size of m and the buffer requirements desired, one would normally decompose 

stages p-i through p and write Jl+1 rows of U into a single record. The 

value of a could be related for purposes of convenience to the most logical 

number of mesh elements to be assembled at one pass, as discussed in the next 

section. At the same time a record containing several rows of U is completed, 

T 

the corresponding rows of L(= U ) which can be made from these rows of U 
are also assembled, and a record containing several rows of L can also be 
stored. (As it turns out, portions of the same niomber of rows of L as there 
exist available rows of D can be formed- ) 


MESH CONSIDERATIONS 


In order to complete the formulation of the split band-Ghoiesky strategy 
it is perhaps helpful to demonstrate the manner in which the assembly of the 
finite element equations proceeds. The illustration will be accomplished 
under the assumption of linear trial functions on a triangular mesh. For 
simplicity the mesh will be obtained from the uniform subdivision of a 
rectangular region; however, the same ideas hold valid on any mesh whose nodal 
connections have equivalent topology, (For example, a uniform mesh on a 
rectangular region can be mapped topologically (and analytically) onto a 
lionuniform mesh on a nonrectangular region) . 
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Such mappings are commonly used in computational fluid dynamics (ref. 18) , 
and provide a workable device for triangulating a nonregular region in a 
regular manner so that design of the mesh generator for finite element programs 
becomes less biurdensome. 

Consider the triangulated mesh of diagram 2, possessing NXG (= 8) nodes 

per line of mesh, and leading to mass matrices of semibandwidth m = NXG . 

Normally the finite element assembly precedes triangle by triangle, with all 

matrix contributions associated with the unknown (nodal) variables of a 

triangle to be computed and stored when this triangle is processed. For the 

split-Cholesky method the assembly is to be'’ completed in several passes, with 

NL layers of triangles processed at each pass (NL ^ 2) . Each assembly pass 

is followed by a decomposition pass, in which the decomposition stage (p) is 

completed for as many nodal variables x , as possible, whose equation is 

fully summed (completed assembled) during the assembly pass. (At the end of 

the assembly pass the level of nodes at the front of the pass is only partially 

summed and thus cannot be processed on this decomposition pass. The preceding 

level cannot be processed for the same reason, since a decomposition stage 

at this level must necessarily partially process the nodal equations NXG 

stages in advance, or in the next level of partially summed nodes.) At the 

end of the decomposition pass a record containing NL+1 rows of U is 

T 

written disk, together with a record of corresponding rows of L(= U ■) . 

(Some shuffling must be done to account for partial rows of L available but 
not permitted yet to be stored.) Irregular length records occur on the first 
and possibly on the last pass. 
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Diagram One, The reduction of a band matrix - 
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NXG = number nodes per 
mesh level 
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assembled at 
one pass 
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/at end of decomposition 
^ pass. 
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processed at end of 
Ldecomposition pass 
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Diagram Two. Mesh Processing Considerations, 
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